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Abstract — When methods of moments are used for identifica- 
tion of power spectral densities, a model is matched to estimated 
second order statistics such as, e.g., covariance estimates. If the 
estimates are good there is an infinite family of power spectra 
consistent with such an estimate and in applications, such as 
identification, we want to single out the most representative 
spectrum. We choose a prior spectral density to represent a 
priori information, and the spectrum closest to it in a given 
quasi-distance is determined. However, if the estimates are 
based on few data, or the model class considered is not 
consistent with the process considered, it may be necessary to 
use an approximative covariance interpolation. Two different 
types of regularizations are considered in this paper that can 
be applied on many covariance interpolation based estimation 
methods. 

I. Introduction 

Most system identification methods are based on an al- 
gorithm that is proven to give efficients estimates when 
the number of data goes to infinity. One such common 
estimate is the maximum likelihood method. However, in 
many cases only a small amount of data is available and 
the estimation method may give unexpected results. Here 
we will consider methods based on covariance interpolation 
instead. Depending on which model class is considered 
there are a number of different methods around now for 
matching AR, MA, ARMA and other models to covariances, 
such as the ones derived by Lindquist, Byrnes, Georgiou, 
Pavon, Ferrante, et. al. based on minimizing the Kullback- 
Leibler [1], Hellinger [2], the Itakura-Saito quasi-distance 
[3], [4], [5], and other distance concepts. However, also these 
methods depends on the amount of data that is available 
and also structural constraints. The covariances have to be 
estimated from the data and the errors in the estimates will 
increase the smaller the available data set is. Estimating 
the covariances from a short data sequence may generate 
a covariance matrix that is not non-negative definite, or does 
not have a supposed Toeplitz structure or the estimate does 
not correspond to a spectra in the supposed model class. 
So for short data sequences it is necessary to regularize the 
methods to obtain relevant model estimates. In this paper 
we compare two different approaches for dealing with these 
kinds of problems; the two different kinds of regularizations 
are based on quadratic penalties on the covariance estimation 
errors and extra entropy regularization of the determined 
spectrum. These approaches have been used before for the 
maximum entropy method for AR-models, the Kullback- 
Leibler method for the ARMA case with fixed MA-part and 
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a combined covariance and cepstmm interpolation problem, 
but here they will be used and compared in a more general 
setting. 

The first kind of problem, with non-negative definite 
covariance matrices, is often "solved" by using a biased 
estimate of the covariance matrix. This bias is usually small 
and goes to zero as the number of data grows, but for small 
data sets it can be relevant. Another approach is to use 
a regularization of the first kind mentioned above, i.e., to 
find a spectrum within the model class which has a small 
quadratic distance to the estimated covariance matrix. By 
combining the covariance interpolation methods based on 
entropy maximization with a quadratic distance penalty the 
structure of the spectrum is taken into account when the best 
covariance sequence close to the estimates is determined. 

The second kind of problem, with the estimate of the 
covariance matrix not having the supposed structure, is often 
solved using a projection onto the class of matrices with 
the desired structure. This problem is most obvious when a 
state-covariance interpolation approach is used; Then there 
is an imposed structure determined by the {A, b) matrices in 
the state-covariance definition. Again, another approach is to 
use the regularization of the first kind mentioned above. A 
small distance to a matrix with the desired structure is then 
obtained. 

The third kind of problem, with a covariance estimate 
that can not be interpolated by a spectrum in the model 
class (but has the desired structure and is non-negative) as 
in MA-model covariance interpolation for some covariance 
estimates. Probably the most common approach to resolve 
this problem is to project the covariance estimates onto the 
set of covariances feasible for the desired model class. For 
the MA case, this would be the projection onto a positive 
cone, but to avoid having zeros on the unit circle a projection 
to a slightly smaller cone should be performed. Another 
approach is to use a regularization of either the first or second 
kind mentioned above. The amount of quadratic penalty 
regularization for the first method has to be determined 
recursively, and might fail for some cases as will be shown by 
some examples. The extra entropy regularization treats this 
case in an easier way and finds both the best approximating 
valid covariance and the interpolant with one optimization 
problem. 

If we want to determine a MA-model estimate for a state- 
covariance estimated from a short data sequence, all of the 
three kinds of problems described above may occur Then it 
would be necessary to use a combination of the two types 
of regularizations. 

Here, a general approach to the covariance matching 



problem is taken that holds for a large set of different quasi- 
distances and is inspired by the work in [6]. 

II. Background 

Let (. . . , yo, yi, • • •) be a scalar stationary stochas- 
tic real valued mean-zero process with covariances rt = 
E{y£+ky<^} and PSD $. The power spectral density $ rep- 
resents the energy content of the process across frequencies 
and has the covariances as Fourier coefficients, 
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Consider the Hilbert space L2{—'JT,Tr] with the iimer 
product 

(«,&> = ^/" a{e^')b{e-^')de. 

Then the covariances are given by r/t = ($,2;*') .Given a 
finite window of covariances r = ( tq ri ... r„ ) , let 
denote the set of PSD consistent with r, i.e., 

= {$ > I z'') =rk, k = OA,---.n} . 

In this paper, $ > means that this inequality should hold 
on the unit circle, i.e., $(e'^) > for 6 G (-7r,7r]. 

Furthermore, we assume initially that the symmetric 
Toeplitz matrix of the covariances r. 
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is positive definite, hence the set contains an infinite 
number of psds [7, Sec.6.5]. Let 7^ = {r |T(r) > 0}. 

III. Moment Matching 

In many situations it is desired to fit a spectral density 
to data by finding one of a particular structure by matching 
moments. The most common psd used to model stationary 
stochastic processes are the ones that correspond to Moving- 
Average (MA) and Auto-Regressive (AR) processes. Assume 
that Q{z) is a pseudo-polynomial of degree n, i.e., 



Q{z) = go + ^qi{z + z ^) 



i(Z„(z" + z-"). (2) 



Then, $ = Q is the PSD of a MA-process and $ = l/Q is 
the PSD of an AR-process. It is well known that for an AR- 
process the coefficients {qk}k=Q of Q ^'^^ always be tuned so 
that a window of covariances r e 7?. is matched. On the other 
hand, it is also well known that for an MA-process there are 
some Y eTZ (actually open subsets of such covariances) that 
are not matched for any choice of coefficients {ft}^=o- 
both cases there are n -|- 1 parameters that should be tuned 
to match n + 1 constraints, but it is clearly the structure of 
the PSD that determines if solutions exists or not. 
To generalize, let 



define an input-to-state map, where we will assume that A is 
an n X n-stabiUty matrix, fo is an n x 1 vector and {A, B) is a 
reachable pair. Then, for a symmetric matrix we can define 
a generalized pseudopolynomial 



Q{z) := G{z)kG*{z). 



(4) 



Similarly, we generalize to 



To evaluate the properties of different PSD structures, we 
let $ depend on Q, and it will also be allowed to depend 
on some "prior estimate" PSD ^. Assuming now that $ = 
F{Q. 'J'), the moment matching constraint $ e S^e. can be 
expressed as 



E = / GRG* 



J GF{Q, *)G* = ^ = J 
where is an arbitrary function in 



(5) 



IV. Exact and Approximative interpolation 

If we use unbiased estimates of the state-covariances from 
a reahzation with a psd $ we know that the PSDdetermined 
by exact moment matching wiU converge to $ as the number 
of samples tend to infinity, if $ is in the class of spectrums 
considered. 

For short realizations it may be necessary to introduce 
some bias to get reasonable estimates. By introducing bias 
the variance of the estimates can be reduced. How this is 
done is an important issue. 

A. Exact interpolation 

The distance measure will be assumed to be differentiable 
in the first argument, and it will be assumed to be a 
quasi-distance, i.e., it is assumed that £'(<1>||\E') > and 
D{^\\^) = for any pair of PSD $ and Furthermore, 
we assume that 



Note that D is not assumed to be symmetric, convex, to 
satisfy the triangle inequality or be zero if and only if $ = ^t. 
However, these are certainly desired properties. Consider the 
optimization problem, to minimize the distance to ^ for all 
$ e ds, i-e.: 



inf 

4>>0 



s.t. / G^G* - S = 0. . 



(6) 



G{z) := {I-zAy^B. 



(3) 



Note that here that the PSD $ is not constrained to be of a 
certain form, this form will be determined by the optimality 
conditions of the Lagrange relaxed functional, which in turn 
is determined by the geometry imposed by the distance 
measure. 

The optimization problem {V=) has no finite dimensional 
parametrization, but by considering the dual, an optimization 
problem with a finite number of variables is obtained. To this 
end, formal calculations are performed to determine the dual. 



Fonn the Lagrangian function 



Lo($;q) = L'($||*)+tr|A(E-y" G^G" 



and since $ is symmetric (4", Xlfe=o 9^^*) = i^jQi^))' 
where Q is defined in (2). Let R e di: arbitrary. Then the 
Lagrangian function can be written as 

Lo($; Q) = nm^) + tr{AS} - Q) . 

Assuming that a minimizer exists let 

$ := argminLo($,Q), (7) 

this defines the optimal psd as a function of Q, i.e., 

$ = F{Q; (8) 
and determines the dual objective function 

Oo(0: *) = io(^, Q) = LoiFiQ; Q). (9) 

To ensure that the spectral densities F{Q; are non- 
negative the domain Q of feasible Q has to be specified, 
i.e., 

Q = {Q\F{Q:^)>0}. 

This leads to the dual problem to determine the maximizer 
of Clo over all Q S Q, i.e.. 



(I?=) 



sup fio(<3;*,-R) 

Q 

s.t. ^) > 0. 



(10) 



The derivative of Cl is (compare the proof of Proposition 
4.1 in [6]) 

and using that F{Q, minimizes Lq it can be shown that 
the first integral is zero. The stationarity conditions for (X>=) 
are then 

E - / Gi^(Q;*)G* = 0, 



for k = 0, 1, • • • , n, which ensures that for an interior point 
solution the optimal $ e 5Je- 

When the state-covariance S is estimated from a short 
sequence of data, it is quite hkely that the there wiU exist no 
exact interpolants. Even for long data sequences the existence 
of solutions may fail if the given realization does not match 
the class of PSDs considered. 

B. Primal regularization 

Consider now the approximative interpolation problem: 



inf Dm\^)+tT{DWD} 



s.t. 



/G$G* 



D 



(11) 



In this problem we consider not only PSDs in g^s, but any 
PSD and then we penalize deviations from the nominal state- 
covariance E using a quadratic penalty term. An alternative 
approach would be to make a fixed extension of the set 
such as fixed intervals of the parameters in S, that 



approach is taken in [8]. Once again, the spectral density 
is not constrained to be of a certain form, this form will 
be determined by the optimality conditions of the Lagrange 
relaxed functional, which in tum is determined by the 
geometry imposed by the distance measure. 

We show that the structure of the optimal $ will be the 
same as for (■p=). Form the Lagrangian function 

L{^,D;K)=D{^\\^)+tr{DWD] 
+tr i^A{D + i: - j G$G*) 

= Lo(^; q) + tr{DWD} + tr{A£>} 

The optimal D = —^W~^A. So for large W the approx- 
imation errors go to zero (if an exact solution exists). 

The optimal psd $ is again determined by (7), hence the 
structure of $ is preserved and $ = F{Q; '^), see (8). 

The dual objective function is then given by 

Q{Q; = A, Q) = Oo(Q; - ^tv{AW-^A}. 

(12) 



This leads to a dual problem on the form 



sup f2(g;*) 

Q 

s.t. F(Q;*)>0. 



(13) 



The stationarity conditions for {'D%) are 

1 



/ 



GF{Q; *)G* = - {AW-'^ + W-'^A) . 



C. Dual regularization 

Consider the dual regularized optimization problem: 



sup Qo((3;*,i?) + AB(Q) 

Q 

s.t. F{Q; > 0. 



(14) 



where B{Q) is a barrier type of function whos purpose is 
to keep the optimum in an interior point, and regularize the 
solution, i.e. avoid too sharp pikes in the psd. 
The barrier function will typically be a function like 

B,{Q) ^ J log(l + g), 

whose derivative in the direction of the boundary goes to 
infinity as A goes to the boundary, or 

whose function values goes to infinty at the boundary. 
The stationarity conditions are then 

S - I GF{Q; *)G* = A J G^-^-^G* 
- J GF{Q; *)G* = A / G ^ 



and 

S 

respectively. 



(1 + G*AG)2 



G* 



The right hand side will be small for small A. If Q is 
close to zero for some frequencies, the integral will still be 
bounded but have a derivative that goes to infinity as Q goes 
to zero. 

The problem (2?^) is a convex optimization problem and 
could therefore be the dual of some optimization problem, 
but the author has not been succesful in finding such a primal 
problem. For some cases, for example when $ = /Q, the 
extra term in the objective function can be seen to increase 
the entropy of the resulting psd. 

D. Comparison of the two regularizations 

We note that both the regularizations results in adding 
a concave function of Q to the dual objective function. In 
(D^) it is a logaritmic term that works as a barrier function 
making sure that the optimum is in an interior point of Q. 
If the optimum of the primal problem {V=) is in an interior 
point, the regularization term is rather small and does not 
affect the solution much but tends to pull it slightly towards 
a spectrum with psd g{'^). If the optimum of the primal 
problem (7^=) is on the boundary, the unbounded derivative 
of the regularization term wiU push the solution towards the 
interior. 

In iT>%) the regularization term is a quadratic function of 
the matching error. By allowing a slack in the covariance 
matching constraint the distance D{(^\\^) can be made 
smaller and a PSD closer to the prior is obtained. This means 
that more trust is put on the prior information and less is put 
on the covariances, which makes sense if the covariances 
are estimated from short data sequences. For the KuUback- 
Leibler distance it is shown in [9] that even if the covariances 
are not in TZ, i.e. , corresponds to a positive definite Toeplitz 
matrix, an approximative solution is obtained if the a is 
chosen small enough. Note that the covariances can fail to 
correspond to a positive definite matrix and they can also fail 
to form a Toeplitz matrix, but an approximation is anyway 
guarranteed. But this does not hold for any choice of quasi- 
distance, as demonstrated by the following example. 

This example illustrates that the primal regularization may 
not help with the approximation of interpolation data S that 
correspond to the theoretical data of some valid PSDoutside 
the class of PSD used for interpolating. The reason is that 
the quadratic term is increasing for large entries of A, but 
not necessarily when approaching the boundary. 

Example 4.1: Consider now the approximative interpola- 
tion problem {V%) for the special case that d{*^\\^) = 

1 ($--£)2 _ ~ 

2 * 

Form the Lagrangian function 

L(*,A;,)m(i^+„||Af 

n 
fe=0 

The optimal A = — ^g. The optimal psd $ is determined 

by 



for all 6^, i.e. ^ = ^{Q + 1). The dual objective function 
is then given by 

\ I fc=o 

(15) 

Therefore, the regularization term and a only changes the 
prior and no matter how small a is chosen it is not always 
possible to find an interior point solution satisfying the sta- 
tionarity conditions 

for k = 0,1, - ■■ ,n. □ 
The next example illustrates that the dual regularization 
may not help with the approximation of interpolation data 
S that does not correspond to the theoretical data of some 
vahd PSD. The reason is that the barrier function is increasing 
when approaching the boundary, but not necessarily for large 
entries of A. 

Example 4.2: Consider now the approximative interpo- 
lation problem {T>L) for the special case that fig = 
— trjAE} + / ^ logQ, which corresponds to the primal with 
the KuUback-Leibler divergence d{^\\^) = ^ log ^, and 
B{Q) = J\ogQ. 

The objective function is then — tr{AI]} + /(^f + A) log Q, 
which corresponds to the exact interpolation problem with 
prior + A. if E is not a positive semidefinite matrix, no 
matter how large A is, there exists no such exact interpolants, 
and the optimization problem (f^) has no finite optimum. □ 

References 

[1] T.T. Georgiou and A. Lindquist, "KuUback-Leibler approximation of 
spectral density functions," IEEE Transactions on Information Theory, 
vol. 49, pp. 2910-2917, Nov 2003. 

[2] A. Ferrante, M. Pavon, and F. Ramponi, "Hellinger versus KuUback- 
Leibler multivariable spectrum approximation," IEEE Trans. Automatic 
Control, vol. 53, no. 4, pp. 954-967, May 2008. 

[3] J. Shore, "Minimum cross-entropy spectral analysis," IEEE Trans. 
Acoustics, Speecfi and Signal Processing, vol. 29, no. 2, pp. 230-237, 
Apr 1981. 

[4] J. Shore and R. Johnson, "Properties of cross-entropy minimization," 
IEEE Trans. Information Theory, vol. 27, no. 4, pp. 472-482, Jul 1981. 

[5] P. Enqvist and J.Karlsson, "Minimal Itakura-Saito distance and covari- 
ance interpolation," 2008, Conference on Decision and Control. 

[6] P. Enqvist, "Covariance interpolation and geometry of power spectral 
densities," in Proceeding ECC 2009, 2009. 

[7] B. Porat, Digital Processing of Random Signals, Theory & Methods, 
Prentice Hall, 1994. 

[8] C.I. Byrnes and A. Lindquist, New Trends in Nonlinear Dynamics 
and Control, chapter The uncertain generalized moment problem with 
complexity constraint, pp. 267-278, Springer Verlag, 2003. 

[9] P. Enqvist and E. Avvent, "Approximative linear and logarithmic 
interpolation of spectra," Tech. Rep. TRITA-MAT 09 OS 02, KTH 
Mathematics, 2009, ISSN 1401-2294. 



